Gene Expression — Query Gene Expression Across Tissues

Gene Expression is a tool that allows users to query the expression of any gene across all data in CELLxGENE Discover. A query results in a dot plot per tissue as explained below.

How to Interpret a Gene Expression Dot Plot

Dot Plot Basics

A dot plot can reveal gross differences in expression patterns across cell types and highlights genes that are moderately or highly expressed in certain cell types.

Dot plots visualize values across two dimensions: color and size (Figure 1). The color of a dot approximates average gene expression. Its size represents the percentage of cells within each cell type that express the gene.

Figure 1. Two metrics are represented in gene expression dot plots, gene expression and percentage of expressing cells.
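As a rough illustration of the two metrics, the sketch below computes a mean expression value (dot color) and a percentage of expressing cells (dot size) per cell type for a single gene. The function name `dot_plot_metrics` and the toy data are hypothetical, and averaging over all cells in a group (rather than only expressing cells) is an assumption made here, not something this section specifies.

```python
import numpy as np


def dot_plot_metrics(expr, cell_types):
    """Return {cell_type: (mean_expression, pct_expressing)} for one gene.

    expr: normalized expression values, one per cell.
    cell_types: cell type labels, same length as expr.
    """
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    metrics = {}
    for ct in np.unique(cell_types):
        values = expr[cell_types == ct]
        # Dot color: average expression.
        # Assumption: the mean is taken over all cells in the group.
        mean_expr = values.mean()
        # Dot size: percentage of cells with non-zero expression.
        pct_expressing = 100.0 * np.count_nonzero(values) / values.size
        metrics[str(ct)] = (float(mean_expr), float(pct_expressing))
    return metrics


# Toy data: six cells, two cell types (hypothetical values).
expr = [0.0, 2.0, 4.0, 0.0, 0.0, 1.0]
labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
metrics = dot_plot_metrics(expr, labels)
```

In this toy example the T cell dot would be both darker (mean 2.0) and larger (about 67% of cells expressing) than the B cell dot (mean about 0.33, about 33% expressing).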

The combination of these metrics in a grid of genes by cell types enables you to assess gene expression (Figure 2).

Be aware that genes that are lowly expressed or expressed in a small percentage of cells may be difficult to visually identify in a dot plot. This is particularly important for certain marker genes which are specifically but lowly expressed in their target cell types, for example transcription factors and cell-surface receptors.

Figure 2. Example of how to interpret the dot plot.

How to Make Sense of Normalized Values

The data used to create the dot plot are normalized with a log transformation of scaled pseudocounts (ln(CPTT+1), where CPTT is counts per ten thousand) and then averaged (see "Gene Expression Data Processing" section for details).
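A minimal sketch of the ln(CPTT+1) transformation follows. It assumes that CPTT means counts per ten thousand and that each cell is scaled by its own total count. The function name `ln_cptt_plus_one` and the toy matrix are hypothetical, and the actual pipeline may differ in detail.

```python
import numpy as np


def ln_cptt_plus_one(raw_counts):
    """Apply ln(CPTT + 1) to a cells x genes matrix of raw counts."""
    raw_counts = np.asarray(raw_counts, dtype=float)
    # Total counts per cell (one row per cell).
    totals = raw_counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for cells with no counts.
    safe_totals = np.where(totals == 0, 1.0, totals)
    # Scale every cell to ten thousand total counts (assumed meaning of CPTT).
    cptt = raw_counts / safe_totals * 1e4
    # log1p(x) computes ln(x + 1).
    return np.log1p(cptt)


# Toy matrix: two cells, three genes (hypothetical values).
raw = np.array([[1, 3, 0], [0, 5, 5]])
normalized = ln_cptt_plus_one(raw)
```

Zero counts stay at zero after the transformation, and a gene that accounts for all of a cell's counts reaches ln(10001), roughly 9.2, which is the upper bound under these assumptions.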

There are two color scales available: scaled and unscaled. The unscaled color map is fixed to a minimum value of 0 and a maximum value of 8; these are comparable across dot plots. The scaled color map is responsive to the data currently in view, and assigns the minimum value in view to 0 and the maximum value in view to 1; these are not comparable across dot plots.
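The difference between the two color scales can be sketched as follows. The 0 and 8 bounds come from the text above. The function names, the clipping of out-of-range values, and the handling of a view where all values are equal are assumptions made for illustration.

```python
import numpy as np

# Fixed bounds of the unscaled color map, as stated in the text.
UNSCALED_MIN, UNSCALED_MAX = 0.0, 8.0


def unscaled_color_position(mean_expr):
    """Position in [0, 1] along the fixed 0-to-8 color map."""
    values = np.asarray(mean_expr, dtype=float)
    # Assumption: values outside the fixed range are clipped.
    clipped = np.clip(values, UNSCALED_MIN, UNSCALED_MAX)
    return (clipped - UNSCALED_MIN) / (UNSCALED_MAX - UNSCALED_MIN)


def scaled_color_position(mean_expr_in_view):
    """Min-max rescale the values currently in view to [0, 1]."""
    values = np.asarray(mean_expr_in_view, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumption: a constant view maps everything to 0.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)


# Two hypothetical views that share the value 2.0.
view_a = [1.0, 2.0, 3.0]
view_b = [1.0, 2.0, 5.0]
unscaled_a = unscaled_color_position(view_a)
unscaled_b = unscaled_color_position(view_b)
scaled_a = scaled_color_position(view_a)
scaled_b = scaled_color_position(view_b)
```

Here the value 2.0 lands at 0.25 on the unscaled map in both views, but at 0.5 in the first scaled view and 0.25 in the second, which is why scaled colors cannot be compared across dot plots.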

Figure 3. Examples of high, medium and low expression.

The examples in Figure 3 have a relatively constant percentage of cells expressing a gene (dot size). To identify highly expressed genes, however, users are advised to pay attention to both the color intensity and the size of the dot.

How to Navigate Cell Types

Cell types in the dot plot (rows) are ordered by default with a heuristic algorithm that tries to preserve relationships in the Cell Ontology (CL).

The expression values and cell counts reported for a parent cell type term include those of its child terms. In other words, the expression of a gene in a parent cell type includes the expression of that gene in all its descendant cell types.
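The roll-up of descendant cell types into their parents can be sketched as below. The small hierarchy, the numbers, and the function name `roll_up` are hypothetical. The sketch also simplifies the ontology to a tree in which every term has at most one parent; a real ontology can give a term several parents, in which case extra care is needed to avoid counting the same cells twice.

```python
from collections import defaultdict

# Hypothetical toy hierarchy, child -> parent (simplified to a tree).
PARENT = {
    "CD4-positive T cell": "T cell",
    "CD8-positive T cell": "T cell",
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
}


def roll_up(direct):
    """Aggregate per-term statistics so each term includes its descendants.

    direct: {cell_type: (n_cells, n_expressing, expression_sum)} for cells
    annotated directly with that term.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])
    for cell_type, stats in direct.items():
        # Add this term's statistics to itself and to every ancestor.
        node = cell_type
        while node is not None:
            for i, value in enumerate(stats):
                totals[node][i] += value
            node = PARENT.get(node)
    return {term: tuple(values) for term, values in totals.items()}


# Hypothetical statistics for one gene.
direct = {
    "CD4-positive T cell": (100, 40, 80.0),
    "CD8-positive T cell": (50, 10, 15.0),
    "T cell": (10, 5, 5.0),
    "B cell": (40, 0, 0.0),
}
rolled = roll_up(direct)
```

With these numbers, "T cell" rolls up to 160 cells, 55 of them expressing, and "lymphocyte" to 200 cells, so the percentage shown for a parent term reflects all of its descendants rather than only the cells annotated with the parent term itself.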

Caveats of Normalization

Given that the data are normalized and concatenated, but not integrated, there may still be significant batch effects present in these data. While normalization and aggregation (taking the mean expression across many cells) somewhat mitigate these artifacts, caution is advised when examining subtle differences in the dot plot across cell types. See our manuscript for a detailed analysis.

Users interested in evaluating the pre-normalized absolute expression data can access them through our CELLxGENE Census API.
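As a starting point, raw counts for one gene in one tissue might be retrieved with the `cellxgene_census` Python package roughly as sketched below. The helper name `fetch_raw_counts` is hypothetical, and the filter columns used here (`feature_name`, `tissue_general`, `is_primary_data`) are assumed from the Census schema and should be checked against the current Census documentation. The call requires network access and the package to be installed.

```python
def fetch_raw_counts(gene_symbol, tissue_general):
    """Return an AnnData object with raw counts for one gene in one tissue.

    Hypothetical helper; the filter column names are assumptions.
    """
    # Imported here so the sketch can be loaded without the package installed.
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_anndata(
            census=census,
            organism="Homo sapiens",
            # Select the gene by its symbol.
            var_value_filter=f"feature_name == '{gene_symbol}'",
            # Restrict to one tissue and skip duplicated cells.
            obs_value_filter=(
                f"tissue_general == '{tissue_general}' "
                "and is_primary_data == True"
            ),
        )
```

A call such as `fetch_raw_counts("MALAT1", "lung")` would then return the matching cells with their raw counts, which can be normalized or aggregated independently of the dot plot.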